
    Gated networks: an inventory

    Gated networks are networks that contain gating connections, in which the outputs of at least two neurons are multiplied. Initially, gated networks were used to learn relationships between two input sources, such as pixels from two images. More recently, they have been applied to activity recognition and to learning multi-modal representations. The aims of this paper are threefold: 1) to explain the basic computations in gated networks to the non-expert, while adopting a standpoint that insists on their symmetric nature; 2) to serve as a quick reference guide to the recent literature, by providing an inventory of applications of these networks as well as recent extensions to the basic architecture; 3) to suggest future research directions and applications. Comment: Unpublished manuscript, 17 pages
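    To make the multiplicative "gating" concrete, here is a minimal sketch of one common factored form of a gated network. It assumes three weight matrices u, v and w (hypothetical names). The matrices u and v project each input onto shared factors, and w maps the gated factors onto mapping units. This illustrates the idea under these assumptions and is not the paper's own formulation. Swapping the two inputs together with their projection matrices leaves the result unchanged, which is one simple facet of the symmetric nature mentioned above.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (n_in, n_out)


def project(v: Vector, w: Matrix) -> Vector:
    """Return w^T v: each output unit sums its weighted inputs."""
    n_out = len(w[0])
    return [sum(v[i] * w[i][o] for i in range(len(v))) for o in range(n_out)]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def gated_mapping(x: Vector, y: Vector, u: Matrix, v: Matrix, w: Matrix) -> Vector:
    """Infer mapping units h from two input sources x and y.

    u projects x onto factors, v projects y onto the same factors, and w
    maps the gated factors onto the mapping units (hypothetical names).
    """
    fx = project(x, u)                       # factor activations driven by x
    fy = project(y, v)                       # factor activations driven by y
    gated = [a * b for a, b in zip(fx, fy)]  # gating: outputs multiplied pairwise
    return [sigmoid(a) for a in project(gated, w)]


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    x, y = [1.0, 2.0], [3.0, 4.0]
    h = gated_mapping(x, y, eye, eye, eye)
    # Symmetry: swapping the two inputs (and their projections) leaves h unchanged.
    assert h == gated_mapping(y, x, eye, eye, eye)
    print(h)
```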

    Policy Search in Continuous Action Domains: an Overview

    Continuous action policy search is currently the focus of intensive research, driven both by the recent success of deep reinforcement learning algorithms and by the emergence of competitors based on evolutionary algorithms. In this paper, we present a broad survey of policy search methods, providing a unified perspective on very different approaches, also including Bayesian Optimization and directed exploration methods. The main message of this overview concerns the relationships between the families of methods, but we also outline some factors underlying the sample efficiency of the various approaches. Comment: Accepted in the Neural Networks Journal (Volume 113, May 2019)

    Path Integral Policy Improvement with Covariance Matrix Adaptation

    There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI2 is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI2 as a member of the wider family of methods that share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. We compare PI2 to other members of the same family (Cross-Entropy Methods and CMA-ES), both at the conceptual level and in terms of performance. The comparison suggests a novel algorithm, which we call PI2-CMA for "Path Integral Policy Improvement with Covariance Matrix Adaptation". PI2-CMA's main advantage is that it determines the magnitude of the exploration noise automatically. Comment: ICML201
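    The idea this family shares, probability-weighted averaging, fits in a few lines. What follows is a simplified, episodic sketch in the spirit of PI2-CMA. It makes three assumptions: a plain cost function over a parameter vector instead of per-time-step rollout costs; an exponentiated, range-normalized cost weighting with an assumed eagerness h = 10; and a covariance estimated around the previous mean. The covariance is re-estimated from the same weighted samples as the mean, and this is what lets the exploration magnitude adapt by itself. The small `eps` ridge is our own addition for numerical safety and is not taken from the paper.

```python
import math
import random
from typing import Callable, List, Tuple

Vec = List[float]
Mat = List[List[float]]


def cholesky(a: Mat) -> Mat:
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def pi2_cma_step(cost: Callable[[Vec], float], theta: Vec, sigma: Mat,
                 n_samples: int, rng: random.Random,
                 h: float = 10.0, eps: float = 1e-6) -> Tuple[Vec, Mat]:
    """One episodic update: sample, weight by cost, average mean and covariance."""
    d = len(theta)
    low = cholesky(sigma)
    # 1) Sample parameter vectors theta_k ~ N(theta, sigma) as theta + L z.
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        samples.append([theta[i] + sum(low[i][j] * z[j] for j in range(i + 1))
                        for i in range(d)])
    # 2) Turn costs into probabilities: the lowest cost gets the largest weight.
    costs = [cost(s) for s in samples]
    lo, hi = min(costs), max(costs)
    span = (hi - lo) or 1.0
    weights = [math.exp(-h * (c - lo) / span) for c in costs]
    total = sum(weights)
    p = [w / total for w in weights]
    # 3) New mean: probability-weighted average of the samples.
    new_theta = [sum(p[k] * samples[k][i] for k in range(n_samples)) for i in range(d)]
    # 4) New covariance: probability-weighted outer products around the old mean,
    #    plus a small ridge eps (our addition) to keep it positive definite.
    new_sigma = [[sum(p[k] * (samples[k][i] - theta[i]) * (samples[k][j] - theta[j])
                      for k in range(n_samples)) + (eps if i == j else 0.0)
                  for j in range(d)] for i in range(d)]
    return new_theta, new_sigma


if __name__ == "__main__":
    target = [3.0, 3.0]

    def sphere(th: Vec) -> float:
        return sum((a - b) ** 2 for a, b in zip(th, target))

    rng = random.Random(0)
    theta, sigma = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(100):
        theta, sigma = pi2_cma_step(sphere, theta, sigma, n_samples=20, rng=rng)
    print(theta, sphere(theta))
```

    On the toy quadratic cost in the `__main__` section, the mean should move from the origin towards the minimum at (3, 3). The covariance should shrink as the mean approaches the minimum, without any hand-tuned exploration schedule.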

    Comparing Semi-Parametric Model Learning Algorithms for Dynamic Model Estimation in Robotics

    Physical modeling of robotic system behavior is the foundation for controlling many robotic mechanisms to a satisfactory degree. Mechanisms are also typically designed so that good model accuracy can be achieved with relatively simple models and model identification strategies. If physically based models are not accurate enough, or become too complex, model-free methods based on machine learning techniques can help. We were therefore particularly interested in the degree to which semi-parametric modeling techniques, meaning combinations of physical models with machine learning, increase the modeling accuracy of inverse dynamics models, which are typically used in robot control. To this end, we evaluated semi-parametric Gaussian process regression and a novel model-based neural network architecture, and compared their modeling accuracy to a series of naive semi-parametric, parametric-only and non-parametric-only regression methods. The comparison was carried out on three test scenarios, one involving a real test-bed and two involving simulations, with the most complex scenario targeting the inverse dynamics model of a simulated robot. We found that in all but one case, semi-parametric Gaussian process regression yields the most accurate models, while also requiring little tuning of the training procedure.
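    A semi-parametric model combines a physics-based part with a learned correction. The sketch below uses a simple two-stage scheme, which may be closer to the naive semi-parametric baselines than to the paper's semi-parametric Gaussian process regression. It first identifies the single parameter of a hypothetical physical model y = a * sin(x) by least squares. It then fits zero-mean Gaussian process regression, with a squared-exponential kernel and an assumed length scale and noise variance, to the residuals. The physical model, kernel settings and toy data are all illustrative assumptions.

```python
import math
from typing import Callable, List

Mat = List[List[float]]


def cholesky(a: Mat) -> Mat:
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_spd(a: Mat, b: List[float]) -> List[float]:
    """Solve a x = b for symmetric positive-definite a via Cholesky."""
    low = cholesky(a)
    n = len(b)
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = z
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def rbf(a: float, b: float, length: float = 0.5) -> float:
    """Squared-exponential kernel with unit signal variance (assumed settings)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def fit_semi_parametric(xs: List[float], ys: List[float],
                        noise_var: float = 1e-4) -> Callable[[float], float]:
    # Parametric part: hypothetical physical model y = a * sin(x),
    # with the single parameter a identified by least squares.
    a = sum(y * math.sin(x) for x, y in zip(xs, ys)) / sum(math.sin(x) ** 2 for x in xs)
    residuals = [y - a * math.sin(x) for x, y in zip(xs, ys)]
    # Non-parametric part: zero-mean GP regression on what the physics misses.
    n = len(xs)
    k = [[rbf(xs[i], xs[j]) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve_spd(k, residuals)

    def predict(x: float) -> float:
        gp_mean = sum(alpha[i] * rbf(x, xs[i]) for i in range(n))
        return a * math.sin(x) + gp_mean

    return predict


if __name__ == "__main__":
    # Toy data: the "true" system contains an unmodelled term 0.3 * x.
    xs = [0.2 * i for i in range(-15, 16)]
    ys = [2.0 * math.sin(x) + 0.3 * x for x in xs]
    model = fit_semi_parametric(xs, ys)
    print(model(2.5), 2.0 * math.sin(2.5) + 0.3 * 2.5)
```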

    Smooth Exploration for Robotic Reinforcement Learning

    Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL, often very successful in simulation, leads to jerky motion patterns on real robots. The resulting shaky behavior causes poor exploration, or even damage to the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE: using more general features and re-sampling the noise periodically. This leads to a new exploration method, generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped and an RC car. The noise sampling interval of gSDE allows a trade-off between performance and smoothness, which makes it possible to train directly on the real robots without loss of performance. The code is available at https://github.com/DLR-RM/stable-baselines3. Comment: Code: https://github.com/DLR-RM/stable-baselines3/ Training scripts: https://github.com/DLR-RM/rl-baselines3-zoo
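    The core of state-dependent exploration is that the noise is a function of the state, not a fresh random draw at every step. The sketch below assumes the linear form eps(s) = Theta_eps z(s). Here Theta_eps is drawn from a zero-mean Gaussian and re-drawn every `sample_freq` steps, and z(s) is a feature vector. In gSDE these are general features, such as the policy's latent features, rather than the raw state. The class `GeneralizedSDE` is hypothetical and is not the stable-baselines3 API. In stable-baselines3 itself, gSDE is enabled through the `use_sde` and `sde_sample_freq` arguments of algorithms such as SAC and PPO.

```python
import random
from typing import List


class GeneralizedSDE:
    """Sketch of gSDE-style exploration noise (hypothetical class, not the SB3 API).

    The noise is linear in state features, eps(s) = Theta_eps z(s), with
    Theta_eps ~ N(0, sigma^2) re-drawn every `sample_freq` steps.
    """

    def __init__(self, feature_dim: int, action_dim: int, sigma: float,
                 sample_freq: int, seed: int = 0) -> None:
        self.feature_dim = feature_dim
        self.action_dim = action_dim
        self.sigma = sigma
        self.sample_freq = sample_freq
        self.rng = random.Random(seed)
        self.steps = 0
        self.resample()

    def resample(self) -> None:
        """Draw a new noise matrix Theta_eps of shape (action_dim, feature_dim)."""
        self.theta_eps = [[self.rng.gauss(0.0, self.sigma) for _ in range(self.feature_dim)]
                          for _ in range(self.action_dim)]

    def noise(self, features: List[float]) -> List[float]:
        """eps(s) = Theta_eps z(s): deterministic for a given state and Theta_eps."""
        return [sum(t * f for t, f in zip(row, features)) for row in self.theta_eps]

    def step(self, features: List[float]) -> List[float]:
        """Noise for one environment step; Theta_eps is re-drawn every sample_freq steps."""
        if self.steps > 0 and self.steps % self.sample_freq == 0:
            self.resample()
        self.steps += 1
        return self.noise(features)


if __name__ == "__main__":
    explorer = GeneralizedSDE(feature_dim=3, action_dim=2, sigma=0.5, sample_freq=4)
    z = [0.1, -0.2, 0.3]  # hypothetical state features
    for t in range(8):
        print(t, explorer.step(z))  # constant within each block of 4 steps
```

    Within one sampling interval, the same state always yields the same perturbation, so nearby states visited in consecutive steps produce similar actions instead of step-wise jitter. A larger `sample_freq` keeps Theta_eps fixed for longer and gives smoother motion. A smaller one re-draws it more often and gives more varied exploration.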

    Many regression algorithms, one unified model — A review

    Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. The history of regression is closely related to the history of artificial neural networks since the seminal work of Rosenblatt (1958). The aims of this paper are to provide an overview of many regression algorithms, and to demonstrate how the function representations whose parameters they regress fall into two classes: a weighted sum of basis functions, or a mixture of linear models. Furthermore, we show that the former is a special case of the latter. Our ambition is thus to provide a deep understanding of the relationship between these algorithms which, despite being derived from very different principles, use a function representation that can be captured within one unified model. Finally, step-by-step derivations of the algorithms from first principles and visualizations of their inner workings allow this article to be used as a tutorial for those new to regression.
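    The claim that a weighted sum of basis functions is a special case of a mixture of linear models can be checked numerically. This sketch assumes scalar inputs and unnormalized Gaussian basis functions, and the function names are our own. Setting every slope a_e to zero and every offset b_e to the corresponding weight w_e turns the mixture of linear models into the basis-function sum.

```python
import math
from typing import Sequence


def gaussian(x: float, center: float, width: float) -> float:
    """Unnormalized Gaussian basis function phi(x; c, w)."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def weighted_basis_sum(x: float, centers: Sequence[float], widths: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Class 1: f(x) = sum_e w_e * phi_e(x), e.g. a radial basis function network."""
    return sum(w * gaussian(x, c, s) for c, s, w in zip(centers, widths, weights))


def mixture_of_linear_models(x: float, centers: Sequence[float], widths: Sequence[float],
                             slopes: Sequence[float], offsets: Sequence[float]) -> float:
    """Class 2: f(x) = sum_e phi_e(x) * (a_e * x + b_e), i.e. weighted local lines."""
    return sum(gaussian(x, c, s) * (a * x + b)
               for c, s, a, b in zip(centers, widths, slopes, offsets))


if __name__ == "__main__":
    centers, widths, weights = [-1.0, 0.0, 1.5], [0.5, 0.8, 0.3], [0.7, -1.2, 2.0]
    for x in (-2.0, 0.0, 0.9):
        rbfn = weighted_basis_sum(x, centers, widths, weights)
        # Zero slopes and offsets equal to the weights: the special case.
        mlm = mixture_of_linear_models(x, centers, widths, [0.0] * 3, weights)
        print(x, rbfn, mlm)
```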

    Covariance Matrix Adaptation for Direct Reinforcement Learning

    Solving problems with continuous states and actions by optimizing parametric policies is a topic of recent interest in reinforcement learning. The PI2 algorithm is an example of this approach; it benefits from solid mathematical foundations drawn from stochastic optimal control and from tools of statistical estimation theory. In this article, we consider PI2 as a member of the wider family of methods that share the concept of probability-weighted averaging to iteratively update parameters in order to optimize a cost function. We compare PI2 to other members of the same family (the "Cross-Entropy Method" and CMA-ES), both at the conceptual level and in terms of performance. The comparison leads to the derivation of a new algorithm that we call PI2-CMA, for "Path Integral Policy Improvement with Covariance Matrix Adaptation". The main advantage of PI2-CMA is that it determines the magnitude of the exploration noise automatically.

    Fault-Tolerant Six-DoF Pose Estimation for Tendon-Driven Continuum Mechanisms

    We propose a fault-tolerant technique for estimating the six-DoF pose of tendon-driven continuum mechanisms using machine learning. In contrast to previous estimation techniques, no deformation model is required; instead, the pose is predicted with polynomial regression. As only a few data points are required for the regression, several estimators are trained with structured occlusions of the available sensor information and clustered into ensembles based on the available sensors. By computing the variance of an ensemble's predictions, the uncertainty in the prediction is monitored and, if the variance is above a threshold, sensor loss is detected and handled. Experiments on the humanoid neck of the DLR robot DAVID demonstrate that the accuracy of the predicted pose is significantly improved, and that a reliable prediction can still be performed using only 3 out of 8 sensors.
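    The variance-based monitoring can be sketched as follows. This version is deliberately reduced: it uses a scalar pose instead of a six-DoF pose, three synthetic sensors, degree-1 polynomials, and one estimator per sensor with all other sensors occluded. The paper's estimators, by contrast, are polynomial regressors over subsets of the real sensors. The class name, the threshold value and the search for a single lost sensor are all hypothetical illustration choices.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of a degree-1 polynomial y = a*x + b."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


class OcclusionEnsemble:
    """One estimator per sensor; each estimator sees only its own sensor."""

    def __init__(self, readings: List[Sequence[float]], poses: Sequence[float]) -> None:
        n_sensors = len(readings[0])
        self.models = [fit_line([r[i] for r in readings], poses) for i in range(n_sensors)]

    def _predict(self, reading: Sequence[float], sensors: Sequence[int]) -> List[float]:
        return [self.models[i][0] * reading[i] + self.models[i][1] for i in sensors]

    def estimate(self, reading: Sequence[float],
                 threshold: float) -> Tuple[float, Optional[int]]:
        """Return (pose estimate, index of the sensor flagged as lost, or None)."""
        sensors = list(range(len(self.models)))
        preds = self._predict(reading, sensors)
        if statistics.pvariance(preds) <= threshold:
            return statistics.fmean(preds), None
        # High ensemble variance: assume a single sensor was lost and keep the
        # sub-ensemble that excludes it and whose members agree best.
        best_var, lost, pose = float("inf"), None, statistics.fmean(preds)
        for candidate in sensors:
            sub = self._predict(reading, [i for i in sensors if i != candidate])
            var = statistics.pvariance(sub)
            if var < best_var:
                best_var, lost, pose = var, candidate, statistics.fmean(sub)
        return pose, lost


if __name__ == "__main__":
    # Hypothetical calibration data: three sensors with known responses to the pose.
    poses = [0.1 * i for i in range(11)]
    readings = [[p, 2.0 * p, p + 1.0] for p in poses]
    ensemble = OcclusionEnsemble(readings, poses)
    print(ensemble.estimate([0.5, 1.0, 1.5], threshold=1e-3))  # healthy sensors
    print(ensemble.estimate([0.5, 0.0, 1.5], threshold=1e-3))  # sensor 1 lost
```

    With healthy readings all estimators agree. When one sensor reads garbage, the ensemble variance exceeds the threshold, and the sub-ensemble that excludes that sensor is used instead.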

    Sensorimotor impairment and haptic support in microgravity

    Future space missions envisage human operators teleoperating robotic systems from orbital spacecraft. A potential risk for such missions is that sensorimotor performance has been observed to deteriorate during spaceflight. This article describes an experiment on sensorimotor performance in two-dimensional manual tracking during different stages of a space mission. We investigated whether there are optimal haptic settings of the human-machine interface for microgravity conditions. This paper presents two empirical studies that use the same task paradigm: a force-feedback joystick with different haptic settings (no haptics, four spring stiffnesses, two motion dampings, three masses). (1) A terrestrial control study (N = 20 subjects) with five experimental sessions, to explore potential learning effects and interactions with haptic settings. (2) A space experiment (N = 3 cosmonauts) with a pre-mission session, three mission sessions on board the ISS (2, 4, and 6 weeks in space), and a post-mission session. Results provide evidence that distorted proprioception significantly affects motion smoothness in the early phase of adaptation to microgravity, while the magnitude of this effect was moderated by the cosmonauts' sensorimotor capabilities. Moreover, this sensorimotor impairment can be compensated for by providing subtle haptic cues. Specifically, low damping improved tracking smoothness in both motion directions (sagittal and transverse motion planes), and low stiffness improved performance in the transverse motion plane.